-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Schema inference parameterized types #32757
base: master
Are you sure you want to change the base?
Schema inference parameterized types #32757
Conversation
97f7cde
to
6e8fe64
Compare
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
d779bd9
to
a1bb67d
Compare
1957eea
to
1366db3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functionality looks sound and tests look great
Left some nits
There's few changes to public, non-internal, method signatures that are breaking though. Might make sense to overload these methods just in case (would also narrow down the scope of this PR)
|
||
if (type instanceof TypeVariable) { | ||
TypeVariable<?> typeVariable = (TypeVariable<?>) type; | ||
return Preconditions.checkArgumentNotNull(boundTypes.get(typeVariable)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to fall back on type
if boundTypes
doesn't contain it for whatever reason? Instead of failing here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer not to, as that would simply be hiding a bug which might cause other problems (often caused by a call to getType() instead of getGenericType()).
Type[] typeArguments = parameterizedType.getActualTypeArguments(); | ||
; | ||
if (typeArguments.length != typeVariables.length) { | ||
throw new RuntimeException("Unmatching arguments lengths"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: include the typeDescriptor in the error message for easier debugging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -243,4 +255,45 @@ public static TypeDescriptor boxIfPrimitive(TypeDescriptor typeDescriptor) { | |||
? TypeDescriptor.of(Primitives.wrap(typeDescriptor.getRawType())) | |||
: typeDescriptor; | |||
} | |||
|
|||
public static <T> Map<Type, Type> getAllBoundTypes(TypeDescriptor<T> typeDescriptor) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add a comment/javadoc for this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
public static FieldValueTypeInformation forField( | ||
Field field, int index, Map<Type, Type> boundTypes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this is a breaking change to a public method (not marked with @Internal
). Are we okay with that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same with some other changes to method signatures in this class: forGetter()
and forSetter()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, I think so. This is a logically an internal class, which was only marked public because we needed to access it from other packages (e.g. the protobuf package). The fact that it was not marked @internal was likely a mistake, and I think it's fine to change it in this case.
@@ -161,7 +162,8 @@ private static boolean matchConstructor( | |||
// Verify that constructor parameters match (name and type) the inferred schema. | |||
for (Parameter parameter : constructor.getParameters()) { | |||
FieldValueTypeInformation type = typeMap.get(parameter.getName()); | |||
if (type == null || type.getRawType() != parameter.getType()) { | |||
; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: cleanup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
public static Schema schemaFromPojoClass( | ||
TypeDescriptor<?> typeDescriptor, FieldValueTypeSupplier fieldValueTypeSupplier) { | ||
return StaticSchemaInference.schemaFromClass(typeDescriptor, fieldValueTypeSupplier); | ||
TypeDescriptor<?> typeDescriptor, | ||
FieldValueTypeSupplier fieldValueTypeSupplier, | ||
Map<Type, Type> boundTypes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: breaking change to a public method's signature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added Internal annotation. This was never meant to be used outside of Beam internals.
.filter(m -> !Modifier.isProtected(m.getModifiers())) | ||
.filter(m -> !Modifier.isStatic(m.getModifiers())) | ||
.forEach(methods::add); | ||
c = c.getSuperclass(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this functionality change to get all methods in the superclass chain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was an existing bug that was uncovered by the new unit tests I added in this PR. JavaFieldSchema (for POJOs) finds all fields in superclasses and adds them to the schema, but we didn't do that for JavaBeans. I changed this to make things consistent and fix the broken tests.
BTW related to your other comments - we should probably mark all of these. helper classes as Internal. They were never meant for external users to use them, and this change (among others) is at least a functionality change
TypeDescriptor<?> typeDescriptor, | ||
FieldValueTypeSupplier fieldValueTypeSupplier, | ||
Map<Type, Type> boundTypes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another breaking method signature change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same with fieldFromType()
method below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added Internal annotation. the fact that it wasn't so marked before was an oversight.
@@ -625,4 +626,41 @@ public void testSetterConstructionWithRenamedFields() throws NoSuchSchemaExcepti | |||
assertEquals( | |||
registry.getFromRowFunction(BeanWithCaseFormat.class).apply(row), beanWithCaseFormat); | |||
} | |||
|
|||
@Test | |||
public void testRegisterBeamWithTypeParameter() throws NoSuchSchemaException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
public void testRegisterBeamWithTypeParameter() throws NoSuchSchemaException { | |
public void testRegisterBeanWithTypeParameter() throws NoSuchSchemaException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
.addInt16Field("value5") | ||
.build(); | ||
assertTrue(expectedSchema.equivalent(schema)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add tests for collection, map, nested types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
Beam automatically infers schema for common Java types: POJOs, JavaBean, AutoValue, AVRO, among others. In addition for it's use with schema transforms, this also provides a simple, efficient way of using these types in PCollections without needing to construct a Coder. A frequent complaint has been that this inference did not work in the presence of generic type parameters. E.g. this means that while Beam can handle PCollections of this class:
@DefaultSchema(JavaFieldSchema.class)
class MyType {
String field1;
}
PCollection myTypes = read();
Schema inference would fail for this:
@DefaultSchema(JavaFieldSchema.class)
class MyType<T> {
T field1;
}
PCollection<MyType> myTypes = read();
This PR adds support for generic type parameters to schema inference for common types, addressing the above issue.